fix issue with elasticsearch extension to respect limit and minReleva…#935
fix issue with elasticsearch extension to respect limit and minReleva…#935basyonic wants to merge 15 commits intomicrosoft:mainfrom
Conversation
…nce parameters when calling GetSimilarListAsync API
|
@microsoft-github-policy-service agree |
|
|
||
| //adjust min score for cosine similarity | ||
| //https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-similarity-search | ||
| float adjustedMinSimilarityScore = (float)(2 * minRelevance - 1); |
There was a problem hiding this comment.
This formula transforms from score to cosine similarity, however internally ES works with score.
I believe this should be the inverse, e.g. float adjustedMinSimilarityScore = (float)((minRelevance + 1) / 2)
By the way, the formula (2 * minRelevance - 1) should be used in line 253, replacing hit.Score with the actual cosine similarity. The value returned is incorrect and doesn't match the similarity returned by other memory connectors.
There was a problem hiding this comment.
As per https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-similarity-search
default similarity metric is cosine and I didn't see something defines another similarity metric in the plugin code.
From my tests I found this score adjustment for cosine returns correct results.
please let me know if I'm missing something here or if you have another proposed tests to include..
A similarity value. This value determines the similarity metric used to score documents based on similarity between the query and document vector. For a list of available metrics, see the [similarity](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-similarity) parameter documentation. **The similarity setting defaults to cosine**.
|
|
||
| var resp = await this._client.SearchAsync<ElasticsearchMemoryRecord>(s => | ||
| s.Index(index) | ||
| s.Index(index).Size(limit) |
There was a problem hiding this comment.
qd.k(limit) already sets a limit, isn't this redundant? should we remove the limit param in qd.k(limit)?
There was a problem hiding this comment.
kNN search adds k matching documents to the search. So, if you set k=20, then it finds 20 matches. Then size takes the top scoring documents from these and returns them.
From my tests in real dataset. it was always returning 10 regardless of the k(limit) value. I think this is the default value of size parameter
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/KernelMemoryTests.cs
Outdated
Show resolved
Hide resolved
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/KernelMemoryTests.cs
Outdated
Show resolved
Hide resolved
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/TestsHelper.cs
Outdated
Show resolved
Hide resolved
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/KernelMemoryTests.cs
Outdated
Show resolved
Hide resolved
extensions/Elasticsearch/Elasticsearch.FunctionalTests/Additional/KernelMemoryTests.cs
Outdated
Show resolved
Hide resolved
…nal/KernelMemoryTests.cs
…nal/KernelMemoryTests.cs
…nal/TestsHelper.cs
|
Pausing this while we wait for SK Vector Store relase and integration. |
|
Closing as part of repository maintenance - no further action planned on this issue. |
…nce parameters when calling GetSimilarListAsync API
Motivation and Context (Why the change? What's the scenario?)
Fixing issue https://github.com/microsoft/kernel-memory/issues/934
High level description (Approach, Design)
using ElasticSearch Client .Size(limit) clause during query.
using .Similarity(min score) to fix minRelevance (after adjusting the score value based on cosine similarity as per https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-similarity-search)